📰 Fake News Detection Using Machine Learning
🔍 Overview
This project addresses the rising challenge of online misinformation by building machine learning models to identify fake news articles.
We used NLP techniques and multiple classifiers to predict whether a news article is real or fake based solely on its content.
🧭 Approach
We collected and cleaned a labeled dataset of news headlines and bodies. We vectorized the text with TF-IDF, trained several
supervised learning models on the resulting features, and compared how well each one classified articles as real or fake.
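
As a rough illustration of this approach, the sketch below chains TF-IDF vectorization and a Logistic Regression classifier with scikit-learn. The six toy headlines, the `0 = real` / `1 = fake` label encoding, and the `max_iter` setting are assumptions made for the example; they are not the project's dataset or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); the project's real dataset is not shown here.
texts = [
    "scientists publish peer reviewed study on vaccine safety",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady this quarter",
    "shocking miracle cure doctors do not want you to know",
    "celebrity secretly replaced by clone insiders claim",
    "you will not believe this one weird trick to get rich",
]
labels = [0, 0, 0, 1, 1, 1]  # assumed encoding: 0 = real, 1 = fake

# The pipeline fits TF-IDF on the training text, then trains the classifier
# on the resulting sparse vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# New text goes through the same fitted vectorizer before prediction.
predictions = model.predict(["miracle trick doctors claim"])
```

Wrapping both steps in one pipeline keeps the vectorizer from being fitted on test data by accident, and swapping in another classifier only requires changing the second step.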
⚙️ Methodologies
- Text Cleaning: Removed punctuation, stopwords, and special characters using regex
- TF-IDF Vectorization: Converted text to numerical vectors
- Modeling: Trained Logistic Regression, Naive Bayes, Decision Tree, SVM, Random Forest, and Gradient Boosting
- Evaluation: Accuracy, confusion matrix, precision, recall, F1-score
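
The text cleaning step could look roughly like the sketch below. The function name `clean_text`, the lowercasing step, and the short `STOPWORDS` set are assumptions for illustration; the project's actual stopword list and regex patterns are not given here.

```python
import re

# Small illustrative stopword set (an assumption); a fuller list can be swapped in.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "on", "to"}

def clean_text(text: str) -> str:
    """Lowercase, strip punctuation and special characters, drop stopwords."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments also collapses the repeated spaces left behind.
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

cleaned = clean_text("BREAKING!!! The President is in talks... #shocking")
print(cleaned)  # breaking president talks shocking
```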
🧰 Technologies
- Language: Python
- Libraries: Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
- NLP Techniques: TF-IDF, text preprocessing
- Models: Logistic Regression, Naive Bayes, Decision Tree, SVM, Random Forest, Gradient Boosting
💡 Key Learnings
- Gained hands-on experience applying NLP techniques to real-world data
- Learned how to evaluate model performance for high-stakes classification problems
- Understood how different algorithms perform on imbalanced text datasets
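
To illustrate why accuracy alone can mislead on imbalanced data, the sketch below scores a set of made-up predictions with scikit-learn's metric functions. The label arrays are hypothetical (8 real articles, 2 fake) and do not come from the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Hypothetical labels: 0 = real, 1 = fake. The classifier misses one of two fakes.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Metrics for the "fake" class only.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.90 precision=1.00 recall=0.50 f1=0.67
```

Accuracy is 0.90 here even though half of the fake articles go undetected, which is why recall and F1 on the minority class are worth reporting alongside it.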
📈 Results
Logistic Regression and Gradient Boosting achieved the highest accuracy of the six models in classifying fake news,
a ranking supported by precision-recall analysis. Both models performed consistently across the test cases,
which makes them useful tools for flagging likely misinformation.